Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Similar resources
Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the sta...
[hal-00829532, v2] Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advan...
[hal-00829532, v3] Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advan...
About upper bounds on the complexity of Policy Iteration
We consider Acyclic Unique Sink Orientations of the n-dimensional hypercube (AUSOs), that is, acyclic orientations of the edges of the hypercube such that any sub-cube has a unique vertex of maximal in-degree. We study the Policy Iteration (PI) algorithm, also known as Bottom-Antipodal or Switch-All, to find the global sink: starting from an initial vertex π0, i = 0, the outgoing links at the p...
The effect of task complexity on lexical complexity and grammatical accuracy of EFL learners’ argumentative writing
Based on Robinson's Cognition Hypothesis (2001, 2003, 2005) and Skehan's Limited Attentional Capacity Model (1998), this study investigated the effect of task complexity on the lexical complexity and grammatical accuracy of the argumentative writing of 60 English-language students. Task complexity was determined through resource-dispersing factors. All participants were semi-randomly assigned to one of three groups: (1) the topic group, (2) the topic + idea group, and (3) the topic + ...
Journal
Journal title: Mathematics of Operations Research
Year: 2016
ISSN: 0364-765X,1526-5471
DOI: 10.1287/moor.2015.0753